Crime poses significant challenges to society, impacting community reputation, individual well-being, and overall economic growth. It affects public safety, urban development, and investment potential, making crime analysis an essential component of urban management and law enforcement strategies.
According to the 2020 U.S. Census, Los Angeles (LA), California, is the second most populous city in the United States, with a population of 3,898,747 (Editor & Tikkanen, 2024). The city's socioeconomic diversity, rapid urbanization, and presence of informal settlements contribute to crime occurrence and escalation. LA has experienced various property crimes (burglary, theft, shoplifting) and violent crimes (assault, homicide, rape, lynching), which negatively impact public perception, tourism, trade, and economic stability.
This study conducts a spatiotemporal analysis of crime trends in Los Angeles from 2020 to 2024 using data-driven methods. The workflow covers data loading and cleaning, keyword-based crime categorization, exploratory analysis by victim age, victim sex, and day of the week, monthly forecasting with Prophet, and a Folium heat map of incident locations.
# Installing and Importing the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.graph_objs as go
!pip install prophet
from prophet import Prophet
!pip install folium
import folium
from folium.plugins import HeatMap
from folium import features
# Loading crime data
crime_data = pd.read_csv('Crime_Data_from_2020_to_Present.csv')
crime_data.describe()
| | DR_NO | TIME OCC | AREA | Rpt Dist No | Part 1-2 | Crm Cd | Vict Age | Premis Cd | Weapon Used Cd | Crm Cd 1 | Crm Cd 2 | Crm Cd 3 | Crm Cd 4 | LAT | LON |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 9.786280e+05 | 978628.000000 | 978628.000000 | 978628.000000 | 978628.000000 | 978628.000000 | 978628.000000 | 978613.000000 | 325959.000000 | 978617.000000 | 68816.000000 | 2309.000000 | 64.00000 | 978628.000000 | 978628.000000 |
| mean | 2.196564e+08 | 1338.802627 | 10.702561 | 1116.686084 | 1.404785 | 500.810635 | 29.122904 | 306.181502 | 363.815372 | 500.564847 | 958.156344 | 984.192724 | 991.21875 | 33.995399 | -118.081108 |
| std | 1.290395e+07 | 651.622947 | 6.107280 | 610.836054 | 0.490851 | 206.309796 | 21.961531 | 218.908131 | 123.673988 | 206.107451 | 110.251477 | 51.506344 | 27.06985 | 1.640056 | 5.684520 |
| min | 8.170000e+02 | 1.000000 | 1.000000 | 101.000000 | 1.000000 | 110.000000 | -4.000000 | 101.000000 | 101.000000 | 110.000000 | 210.000000 | 310.000000 | 821.00000 | 0.000000 | -118.667600 |
| 25% | 2.106073e+08 | 900.000000 | 5.000000 | 589.000000 | 1.000000 | 331.000000 | 0.000000 | 101.000000 | 311.000000 | 331.000000 | 998.000000 | 998.000000 | 998.00000 | 34.014600 | -118.430500 |
| 50% | 2.208116e+08 | 1420.000000 | 11.000000 | 1141.000000 | 1.000000 | 442.000000 | 30.000000 | 203.000000 | 400.000000 | 442.000000 | 998.000000 | 998.000000 | 998.00000 | 34.058900 | -118.322500 |
| 75% | 2.309110e+08 | 1900.000000 | 16.000000 | 1617.000000 | 2.000000 | 626.000000 | 44.000000 | 501.000000 | 400.000000 | 626.000000 | 998.000000 | 998.000000 | 998.00000 | 34.164900 | -118.273900 |
| max | 2.499253e+08 | 2359.000000 | 21.000000 | 2199.000000 | 2.000000 | 956.000000 | 120.000000 | 976.000000 | 516.000000 | 956.000000 | 999.000000 | 999.000000 | 999.00000 | 34.334300 | 0.000000 |
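The summary above contains a few values that look like placeholders rather than measurements: a minimum `Vict Age` of -4, a minimum `LAT` of 0, and a maximum `LON` of 0. A minimal sketch for counting such rows before any analysis follows. The helper name `count_suspect_values`, the rules it applies, and the sample rows are assumptions made for illustration; they are not part of the original notebook.

```python
import pandas as pd

def count_suspect_values(df: pd.DataFrame) -> dict:
    """Count rows whose values look like placeholders (assumed rules)."""
    return {
        # Ages below zero cannot be real victim ages
        "negative_age": int((df["Vict Age"] < 0).sum()),
        # A latitude or longitude of exactly 0 is far outside Los Angeles
        "zero_coordinates": int(((df["LAT"] == 0) | (df["LON"] == 0)).sum()),
    }

# Made-up rows for illustration only, not taken from the dataset
sample = pd.DataFrame({
    "Vict Age": [47, -4, 0, 30],
    "LAT": [34.0444, 34.0210, 0.0, 34.1576],
    "LON": [-118.2628, -118.3002, 0.0, -118.4387],
})
suspect_counts = count_suspect_values(sample)
print(suspect_counts)
```

Running the same helper on `crime_data` right after loading would show how many records are affected before deciding whether to drop or keep them.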
# Dropping columns that are not needed for this analysis
columns_to_drop = ['DR_NO', 'Date Rptd', 'TIME OCC', 'AREA', 'Rpt Dist No', 'Part 1-2', 'Crm Cd','Mocodes', 'Premis Cd', 'Weapon Used Cd', 'Weapon Desc', 'Status', 'Crm Cd 1', 'Crm Cd 2', 'Crm Cd 3', 'Crm Cd 4', 'Cross Street' ]
crime_data = crime_data.drop(columns=columns_to_drop)
crime_data
| | DATE OCC | AREA NAME | Crm Cd Desc | Vict Age | Vict Sex | Vict Descent | Premis Desc | Status Desc | LOCATION | LAT | LON |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 03/01/2020 12:00:00 AM | Wilshire | VEHICLE - STOLEN | 0 | M | O | STREET | Adult Arrest | 1900 S LONGWOOD AV | 34.0375 | -118.3506 |
| 1 | 02/08/2020 12:00:00 AM | Central | BURGLARY FROM VEHICLE | 47 | M | O | BUS STOP/LAYOVER (ALSO QUERY 124) | Invest Cont | 1000 S FLOWER ST | 34.0444 | -118.2628 |
| 2 | 11/04/2020 12:00:00 AM | Southwest | BIKE - STOLEN | 19 | X | X | MULTI-UNIT DWELLING (APARTMENT, DUPLEX, ETC) | Invest Cont | 1400 W 37TH ST | 34.0210 | -118.3002 |
| 3 | 03/10/2020 12:00:00 AM | Van Nuys | SHOPLIFTING-GRAND THEFT ($950.01 & OVER) | 19 | M | O | CLOTHING STORE | Invest Cont | 14000 RIVERSIDE DR | 34.1576 | -118.4387 |
| 4 | 08/17/2020 12:00:00 AM | Hollywood | THEFT OF IDENTITY | 28 | M | H | SIDEWALK | Invest Cont | 1900 TRANSIENT | 34.0944 | -118.3277 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 978623 | 07/23/2024 12:00:00 AM | Wilshire | VEHICLE - STOLEN | 0 | NaN | NaN | STREET | Invest Cont | 4000 W 23RD ST | 34.0362 | -118.3284 |
| 978624 | 01/15/2024 12:00:00 AM | Central | VANDALISM - MISDEAMEANOR ($399 OR UNDER) | 0 | X | X | HOTEL | Invest Cont | 1300 W SUNSET BL | 34.0685 | -118.2460 |
| 978625 | 07/19/2024 12:00:00 AM | Devonshire | TRESPASSING | 0 | X | X | MTA - ORANGE LINE - CHATSWORTH | Invest Cont | 10000 OLD DEPOT PLAZA RD | 34.2500 | -118.5990 |
| 978626 | 04/24/2024 12:00:00 AM | Southwest | ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT | 70 | F | W | SIDEWALK | Invest Cont | FLOWER ST | 34.0215 | -118.2868 |
| 978627 | 08/12/2024 12:00:00 AM | Van Nuys | VEHICLE - STOLEN | 0 | NaN | NaN | PARKING LOT | Invest Cont | 6900 VESPER AV | 34.1961 | -118.4510 |
978628 rows × 11 columns
#Dropping the NaN values
crime_data = crime_data.dropna()
# Confirming that no NaN values remain in any column
na_counts = crime_data.isna().sum()
print(na_counts)
DATE OCC        0
AREA NAME       0
Crm Cd Desc     0
Vict Age        0
Vict Sex        0
Vict Descent    0
Premis Desc     0
Status Desc     0
LOCATION        0
LAT             0
LON             0
dtype: int64
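Because the counts above are computed after `dropna()`, they are zero by construction and do not show how much data was removed. A small sketch of taking the same counts before dropping is given below. The sample frame is made up for illustration, and the variable names `na_before` and `rows_dropped` are assumptions.

```python
import numpy as np
import pandas as pd

# Made-up frame with gaps in two columns, for illustration only
sample = pd.DataFrame({
    "Vict Sex": ["M", np.nan, "F", np.nan],
    "Premis Desc": ["STREET", "STREET", np.nan, "HOTEL"],
    "LAT": [34.03, 34.04, 34.02, 34.19],
})

# Count missing values per column BEFORE dropping anything
na_before = sample.isna().sum()
rows_before = len(sample)

# Drop every row that has at least one missing value
cleaned = sample.dropna()
rows_dropped = rows_before - len(cleaned)

print(na_before)
print(f"Dropped {rows_dropped} of {rows_before} rows")
```

Recording both numbers makes it possible to judge whether dropping rows is acceptable or whether only selected columns should be required to be non-null.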
crime_data = crime_data.copy()
# Defining categories based on keywords
def categorize_crime(description):
    if any(keyword in description for keyword in ['ASSAULT', 'BATTERY']):
        return 'Assault/Battery'
    elif any(keyword in description for keyword in ['BURGLARY', 'THEFT', 'SHOPLIFTING', 'STOLEN']):
        return 'Theft/Burglary'
    elif any(keyword in description for keyword in ['ARSON']):
        return 'Arson'
    elif any(keyword in description for keyword in ['CHILD', 'PORNOGRAPHY']):
        return 'Child-related Crimes'
    elif any(keyword in description for keyword in ['ROBBERY']):
        return 'Robbery'
    elif any(keyword in description for keyword in ['FRAUD', 'EMBEZZLEMENT']):
        return 'Fraud/Financial Crimes'
    elif any(keyword in description for keyword in ['FIREARMS', 'WEAPONS', 'SHOTS']):
        return 'Firearm/Weapon Offense'
    else:
        return 'Other'
# Applying the categorization
crime_data.loc[:, 'Crime Category 1'] = crime_data['Crm Cd Desc'].apply(categorize_crime)
# Displaying the crime category
print(crime_data['Crime Category 1'].unique())
['Theft/Burglary' 'Assault/Battery' 'Other' 'Child-related Crimes' 'Robbery' 'Fraud/Financial Crimes' 'Arson' 'Firearm/Weapon Offense']
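The keyword checks are evaluated in order, so a description that matches several keyword groups is assigned to the first matching category. The sketch below expresses the same rules as an ordered table, which makes that priority explicit and easier to adjust. The names `CATEGORY_RULES` and `categorize_with_rules` are hypothetical, and the description containing both `CHILD` and `ASSAULT` is a made-up string used only to show the priority effect.

```python
# Ordered rules: earlier entries win when a description matches more than one
CATEGORY_RULES = [
    ("Assault/Battery", ("ASSAULT", "BATTERY")),
    ("Theft/Burglary", ("BURGLARY", "THEFT", "SHOPLIFTING", "STOLEN")),
    ("Arson", ("ARSON",)),
    ("Child-related Crimes", ("CHILD", "PORNOGRAPHY")),
    ("Robbery", ("ROBBERY",)),
    ("Fraud/Financial Crimes", ("FRAUD", "EMBEZZLEMENT")),
    ("Firearm/Weapon Offense", ("FIREARMS", "WEAPONS", "SHOTS")),
]

def categorize_with_rules(description: str) -> str:
    """Return the first category whose keywords appear in the description."""
    for category, keywords in CATEGORY_RULES:
        if any(keyword in description for keyword in keywords):
            return category
    return "Other"

examples = [
    "VEHICLE - STOLEN",
    "TRESPASSING",
    "CHILD ABUSE - SIMPLE ASSAULT",  # made-up string matching two groups
]
for text in examples:
    print(text, "->", categorize_with_rules(text))
```

With this layout, changing which category takes precedence is a matter of reordering the list rather than rewriting the `if`/`elif` chain.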
# Counting the records for each crime category
crime_category_counts = crime_data['Crime Category 1'].value_counts()
# Plotting a bar chart
plt.figure(figsize=(10, 6))
bar_chart = crime_category_counts.plot(kind='bar', color='skyblue', edgecolor='black')
# Adding titles and labels
plt.title('Count of Records by Crime Category', fontsize=16)
plt.xlabel('Crime Category', fontsize=14)
plt.ylabel('Record Count', fontsize=14)
plt.xticks(rotation=45, ha='right', fontsize=12)
for index, value in enumerate(crime_category_counts):
    plt.text(index, value + 5, str(value), ha='center', fontsize=10, color='black')
plt.tight_layout()
plt.show()
# Categorizing 'Vict Age' into 10-year intervals
age_bins = list(range(0, 101, 10))
labels = [f'{i}-{i+9}' for i in age_bins[:-1]]
crime_data['Age Category'] = pd.cut(crime_data['Vict Age'], bins=age_bins, labels=labels, right=False)
# Counting the number of crimes per age category
age_category_count = crime_data['Age Category'].value_counts().sort_index()
# Plotting the results
plt.figure(figsize=(10, 6))
age_category_count.plot(kind='bar', color='skyblue')
plt.title('Number of victims by Age Category')
plt.xlabel('Age Category')
plt.ylabel('Crime Count')
plt.xticks(rotation=45)
for index, value in enumerate(age_category_count):
    plt.text(index, value + 5, str(value), ha='center', fontsize=10, color='black')
plt.show()
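With `right=False`, each bin is closed on the left and open on the right, so the last bin is [90, 100) and ages of 100 or more, as well as negative ages, receive no category. The earlier summary statistics show a maximum age of 120, a minimum of -4, and a 25th percentile of 0, so these edge cases do occur, and the many zero ages (possibly a placeholder for an unknown age, which is an assumption) all land in the `0-9` bar. A small sketch with made-up ages shows the behaviour:

```python
import pandas as pd

# Same bin edges and labels as in the cell above
age_bins = list(range(0, 101, 10))
labels = [f"{i}-{i+9}" for i in age_bins[:-1]]

# Made-up ages covering the edge cases
ages = pd.Series([0, 9, 10, 99, 100, 120, -4])
binned = pd.cut(ages, bins=age_bins, labels=labels, right=False)

# Values outside [0, 100) become NaN and vanish from value_counts()
unbinned_count = int(binned.isna().sum())

print(pd.DataFrame({"age": ages, "category": binned}))
print("Ages without a category:", unbinned_count)
```

Checking `unbinned_count` on the real column is a quick way to see how many victims are silently left out of the age chart.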
# Counting the number of crimes for each victim sex
victim_sex_count = crime_data['Vict Sex'].value_counts()
# Plotting the results
plt.figure(figsize=(8, 5))
victim_sex_count.plot(kind='bar', color='skyblue')
plt.title('Number of Crimes Vs Victim Sex')
plt.xlabel('Victim Sex')
plt.ylabel('Crime Count')
plt.xticks(rotation=0)
for index, value in enumerate(victim_sex_count):
    plt.text(index, value + 5, str(value), ha='center', fontsize=10, color='black')
plt.show()
# Parsing the occurrence date and deriving the day of the week
crime_data['DATE OCC'] = pd.to_datetime(crime_data['DATE OCC'], format='%m/%d/%Y %I:%M:%S %p', errors='coerce')
crime_data['Day_of_Week'] = crime_data['DATE OCC'].dt.day_name()
# Counting the incidents for each day of the week
crime_category_counts = crime_data['Day_of_Week'].value_counts()
# Plotting a bar chart
plt.figure(figsize=(10, 6))
bar_chart = crime_category_counts.plot(kind='bar', color='skyblue', edgecolor='black')
# Adding titles and labels
plt.title('Crimes by Day of the Week', fontsize=16)
plt.xlabel('Day of Week', fontsize=14)
plt.ylabel('Number of Incidents', fontsize=14)
plt.xticks(rotation=45, ha='right', fontsize=12)
for index, value in enumerate(crime_category_counts):
    plt.text(index, value + 5, str(value), ha='center', fontsize=10, color='black')
plt.tight_layout()
plt.show()
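`value_counts()` sorts by frequency, so the bars in the chart above appear from the busiest day to the quietest rather than from Monday to Sunday. If calendar order is preferred, the counts can be reindexed against a fixed list of day names. The sketch below uses a handful of made-up dates in the same format as `DATE OCC`; the name `DAY_ORDER` is an assumption.

```python
import pandas as pd

DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Made-up dates in the same string format as the 'DATE OCC' column
raw_dates = pd.Series([
    "03/01/2020 12:00:00 AM",
    "02/08/2020 12:00:00 AM",
    "11/04/2020 12:00:00 AM",
    "03/10/2020 12:00:00 AM",
    "03/08/2020 12:00:00 AM",
])
dates = pd.to_datetime(raw_dates, format="%m/%d/%Y %I:%M:%S %p", errors="coerce")

# Count per weekday, then force Monday-to-Sunday order (missing days get 0)
day_counts = dates.dt.day_name().value_counts().reindex(DAY_ORDER, fill_value=0)
print(day_counts)
```

The reindexed series can be passed to `.plot(kind='bar')` exactly like the frequency-sorted one.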
# Extracting the year and the year-month period from the occurrence date
crime_data['Year'] = crime_data['DATE OCC'].dt.year
crime_data['Year-Month'] = crime_data['DATE OCC'].dt.to_period('M')
# Grouping by 'Year-Month' and count the incidents for each period
monthly_crime_counts = crime_data.groupby('Year-Month').size()
# Plotting the number of crime incidents by Year-Month
plt.figure(figsize=(12, 6))
monthly_crime_counts.plot(kind='line', marker='o', color='skyblue', linewidth=2)
# Adding labels and title
plt.xlabel('Year-Month')
plt.ylabel('Number of Incidents')
plt.title('Number of Crime Incidents by Year-Month')
# Customizing the plot
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
# Displaying the plot
plt.show()
# Grouping data by Year-Month to get monthly crime counts,
# then renaming the columns to 'ds' (date) and 'y' (count) as Prophet expects
monthly_crime_count = crime_data.groupby('Year-Month').size().reset_index(name='Count')
monthly_crime_count['Year-Month'] = pd.to_datetime(monthly_crime_count['Year-Month'].astype(str))
monthly_crime_count.columns = ['ds', 'y']
monthly_crime_count.head()
| | ds | y |
|---|---|---|
| 0 | 2020-01-01 | 16725 |
| 1 | 2020-02-01 | 15586 |
| 2 | 2020-03-01 | 14309 |
| 3 | 2020-04-01 | 13505 |
| 4 | 2020-05-01 | 14915 |
# Initializing and fitting the Prophet model
model = Prophet()
model.fit(monthly_crime_count)
# Creating a DataFrame for future predictions (remaining months of 2024)
future = model.make_future_dataframe(periods=4, freq='M')
forecast = model.predict(future)
# Displaying the forecast results
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail(13)
| | ds | yhat | yhat_lower | yhat_upper |
|---|---|---|---|---|
| 48 | 2024-01-01 | 14885.576198 | 10795.613319 | 18773.847838 |
| 49 | 2024-02-01 | 13425.375931 | 9721.432351 | 17479.876831 |
| 50 | 2024-03-01 | 13911.225252 | 9979.239810 | 17942.995117 |
| 51 | 2024-04-01 | 12776.488784 | 8773.805889 | 16715.204070 |
| 52 | 2024-05-01 | 12728.544858 | 9035.595705 | 16688.040021 |
| 53 | 2024-06-01 | 12612.610423 | 8822.706444 | 16479.590835 |
| 54 | 2024-07-01 | 13020.227321 | 9299.558642 | 16832.479950 |
| 55 | 2024-08-01 | 12634.156872 | 8841.442804 | 16464.512741 |
| 56 | 2024-09-01 | 10627.285262 | 6695.002871 | 14799.672453 |
| 57 | 2024-09-30 | 12226.569253 | 8299.856286 | 15939.147945 |
| 58 | 2024-10-31 | 13068.577006 | 9148.468207 | 16879.194605 |
| 59 | 2024-11-30 | 15504.770064 | 11553.225510 | 19377.436226 |
| 60 | 2024-12-31 | 14322.701267 | 10018.081385 | 18105.088780 |
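In the table above, the historical rows fall on the first day of each month, while the four forecast rows fall on month-end dates (2024-09-30 through 2024-12-31). This is consistent with `freq='M'`, which denotes month-end frequency in pandas, whereas `'MS'` denotes month start. The pandas-only sketch below imitates how future dates are generated after the last observed date; the helper `future_dates` is hypothetical and does not call Prophet.

```python
import pandas as pd

def future_dates(last_date, periods, freq):
    """Return `periods` dates strictly after `last_date` at frequency `freq`."""
    dates = pd.date_range(start=last_date, periods=periods + 1, freq=freq)
    return dates[dates > last_date][:periods]

# Last month-start date present in the history shown in the table
last_observed = pd.Timestamp("2024-09-01")

# Month-end frequency: the first "future" date is still inside September
month_end = future_dates(last_observed, 4, pd.offsets.MonthEnd())

# Month-start frequency: three periods cover October to December
month_start = future_dates(last_observed, 3, "MS")

print(list(month_end.strftime("%Y-%m-%d")))
print(list(month_start.strftime("%Y-%m-%d")))
```

Under these assumptions, calling `make_future_dataframe(periods=3, freq='MS')` would be expected to keep the forecast dates aligned with the month-start dates in the training data; the forecast values themselves would need to be regenerated to confirm.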
# Plotting traces for actual and forecasted data
trace_actual = go.Scatter(
    x=monthly_crime_count['ds'],
    y=monthly_crime_count['y'],
    mode='lines+markers',
    name='Actual Crime Count',
    marker=dict(color='blue'),
    line=dict(color='blue', width=2)
)
trace_forecast = go.Scatter(
    x=forecast['ds'],
    y=forecast['yhat'],
    mode='lines',
    name='Forecasted Crime Count',
    marker=dict(color='red'),
    line=dict(color='red', width=2)
)
trace_upper = go.Scatter(
    x=forecast['ds'],
    y=forecast['yhat_upper'],
    mode='lines',
    name='Forecast Upper Bound',
    line=dict(color='grey', dash='dash'),
    showlegend=False
)
trace_lower = go.Scatter(
    x=forecast['ds'],
    y=forecast['yhat_lower'],
    mode='lines',
    name='Forecast Lower Bound',
    line=dict(color='grey', dash='dash'),
    fill='tonexty',  # Fill between the upper and lower bounds
    fillcolor='rgba(128, 128, 128, 0.2)',
    showlegend=False
)
# Combining the traces into a figure
data = [trace_actual, trace_forecast, trace_upper, trace_lower]
# Layout customization
layout = go.Layout(
    title='Actual vs Forecasted Monthly Crime Count',
    xaxis=dict(title='Date'),
    yaxis=dict(title='Crime Count'),
    hovermode='closest'
)
# Creating the figure and plot
fig = go.Figure(data=data, layout=layout)
fig.show()
# Coercing 'LAT' and 'LON' to numeric and collecting the coordinate pairs in a list
crime_data['LAT'] = pd.to_numeric(crime_data['LAT'], errors='coerce')
crime_data['LON'] = pd.to_numeric(crime_data['LON'], errors='coerce')
crime_locations = crime_data[['LAT', 'LON']].values.tolist()
# Defining map bounds for LA area
min_lat, max_lat = 33.5, 34.5
min_lon, max_lon = -119.0, -117.5
bounds = [[min_lat, min_lon], [max_lat, max_lon]]
# Initializing a Folium map centered on LA with fixed zoom limits
map = folium.Map(
    location=[34.05, -118.25],
    zoom_start=11,
    min_zoom=10,
    max_zoom=14,
    max_bounds=True
)
# Adding the heat map layer
HeatMap(crime_locations, radius=10, blur=10, max_zoom=13, opacity=0.2).add_to(map)
# Adding a custom legend
legend_html = """
<div style="
position: fixed;
bottom: 50px;
left: 50px;
width: 200px;
height: 120px;
background-color: white;
border:2px solid grey;
z-index:9999;
font-size:14px;
padding: 10px;
">
<b>Crime Density Legend</b><br>
<i style="background: blue; width: 10px; height: 10px; display: inline-block;"></i> Low Density<br>
<i style="background: green; width: 10px; height: 10px; display: inline-block;"></i> Medium Density<br>
<i style="background: orange; width: 10px; height: 10px; display: inline-block;"></i> High Density<br>
<i style="background: red; width: 10px; height: 10px; display: inline-block;"></i> Very High Density<br>
</div>
"""
map.get_root().html.add_child(features.Element(legend_html))
# Adding a custom title
title_html = """
<div style="
position: fixed;
top: 10px;
left: 50%;
transform: translateX(-50%);
z-index: 1000;
background-color: white;
padding: 10px;
font-size: 20px;
font-weight: bold;
border: 2px solid grey;
border-radius: 5px;
">
Crime Heatmap of Los Angeles
</div>
"""
map.get_root().html.add_child(features.Element(title_html))
# Displaying the map
map
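The bounding box defined for the LA area is not applied to the coordinate list in the cells above, and the summary statistics show coordinates of exactly 0 in both `LAT` and `LON`. A minimal sketch of filtering the points to the bounding box before building the heat map follows. The sample rows are made up for illustration, and the names `in_bounds` and `heat_points` are assumptions.

```python
import pandas as pd

# Same bounding box as defined for the LA area
min_lat, max_lat = 33.5, 34.5
min_lon, max_lon = -119.0, -117.5

# Made-up rows: two valid points, one (0, 0) placeholder, one missing latitude
sample = pd.DataFrame({
    "LAT": [34.0375, 0.0, 34.2500, float("nan")],
    "LON": [-118.3506, 0.0, -118.5990, -118.2460],
})

# Series.between is inclusive by default and evaluates to False for NaN
in_bounds = (
    sample["LAT"].between(min_lat, max_lat)
    & sample["LON"].between(min_lon, max_lon)
)
heat_points = sample.loc[in_bounds, ["LAT", "LON"]].values.tolist()
print(heat_points)
```

Passing a filtered list like `heat_points` to `HeatMap` would keep placeholder and missing coordinates out of the density layer.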